Search | VHL Regional Portal (Portal Regional da BVS)

TT3D: Leveraging precomputed protein 3D sequence models to predict protein-protein interactions.

Sledzieski, Samuel; Devkota, Kapil; Singh, Rohit; Cowen, Lenore; Berger, Bonnie.

Bioinformatics ; 39(11), 2023 11 01.

Article in English | MEDLINE | ID: mdl-37897686

ABSTRACT

MOTIVATION: High-quality computational structural models are now precomputed and available for nearly every protein in UniProt. However, the best way to leverage these models to predict which pairs of proteins interact in a high-throughput manner is not immediately clear. The recent Foldseek method of van Kempen et al. encodes the structural information of distances and angles along the protein backbone into a linear string of the same length as the protein string, using tokens from a 21-letter discretized structural alphabet (3Di). RESULTS: We show that using both the amino acid sequence and the 3Di sequence generated by Foldseek as inputs to our recent deep-learning method, Topsy-Turvy, substantially improves the performance of predicting protein-protein interactions cross-species. Thus TT3D (Topsy-Turvy 3D) presents a way to reuse all the computational effort going into producing high-quality structural models from sequence, while being sufficiently lightweight so that high-quality binary protein-protein interaction predictions across all protein pairs can be made genome-wide. AVAILABILITY AND IMPLEMENTATION: TT3D is available at https://github.com/samsledje/D-SCRIPT. An archived version of the code at time of submission can be found at https://zenodo.org/records/10037674.

Subjects

Proteins, Software, Amino Acid Sequence, Proteins/chemistry
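
The TT3D abstract above notes that the 3Di string has the same length as the protein sequence, so the two can be aligned position by position. The following is a minimal sketch of that idea only, not the TT3D architecture: it one-hot encodes both strings and concatenates the features per residue. The names `one_hot` and `combine_features` are hypothetical, and `STRUCT_ALPHABET` is a placeholder 21-symbol alphabet, since the listing does not give the actual 3Di token set.

```python
# Assumption: 20 standard amino acids plus "X" for unknown residues.
AA_ALPHABET = "ACDEFGHIKLMNPQRSTVWYX"
# Assumption: placeholder 21-symbol structural alphabet. The real 3Di
# token set is defined by Foldseek and may differ from this string.
STRUCT_ALPHABET = "ACDEFGHIKLMNPQRSTVWYX"


def one_hot(seq, alphabet):
    """Return one row per position, with a single 1.0 at the symbol's index."""
    index = {symbol: i for i, symbol in enumerate(alphabet)}
    rows = []
    for symbol in seq:
        row = [0.0] * len(alphabet)
        row[index[symbol]] = 1.0  # raises KeyError on an unknown symbol
        rows.append(row)
    return rows


def combine_features(aa_seq, struct_seq):
    """Concatenate per-residue sequence and structure features."""
    if len(aa_seq) != len(struct_seq):
        raise ValueError("sequences must have the same length")
    aa_rows = one_hot(aa_seq, AA_ALPHABET)
    struct_rows = one_hot(struct_seq, STRUCT_ALPHABET)
    # Each output row describes one residue: sequence features, then structure.
    return [a + s for a, s in zip(aa_rows, struct_rows)]
```

For a three-residue toy input such as `combine_features("MKV", "DVL")`, the result has three rows of 42 values each, with one active entry in the sequence half and one in the structure half.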

Cell-specific imputation of drug connectivity mapping with incomplete data.

Sapashnik, Diana; Newman, Rebecca; Pietras, Christopher Michael; Zhou, Di; Devkota, Kapil; Qu, Fangfang; Kofman, Lior; Boudreau, Sean; Fried, Inbar; Slonim, Donna K.

PLoS One ; 18(2): e0278289, 2023.

Article in English | MEDLINE | ID: mdl-36795645

ABSTRACT

Drug repositioning allows expedited discovery of new applications for existing compounds, but re-screening vast compound libraries is often prohibitively expensive. "Connectivity mapping" is a process that links drugs to diseases by identifying compounds whose impact on expression in a collection of cells reverses the disease's impact on expression in disease-relevant tissues. The LINCS project has expanded the universe of compounds and cells for which data are available, but even with this effort, many clinically useful combinations are missing. To evaluate the possibility of repurposing drugs despite missing data, we compared collaborative filtering using either neighborhood-based or SVD imputation methods to two naive approaches via cross-validation. Methods were evaluated for their ability to predict drug connectivity despite missing data. Predictions improved when cell type was taken into account. Neighborhood collaborative filtering was the most successful method, with the best improvements in non-immortalized primary cells. We also explored which classes of compounds are most and least reliant on cell type for accurate imputation. We conclude that even for cells in which drug responses have not been fully characterized, it is possible to identify unassayed drugs that reverse in those cells the expression signatures observed in disease.

Subjects

Drug Repositioning, Research Design, Drug Repositioning/methods
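
The abstract above reports that neighborhood-based collaborative filtering worked best for imputing missing drug connectivity data. The sketch below shows a generic version of that technique, not the authors' implementation. It assumes a matrix whose rows are drugs and whose columns are cell types, with `None` marking an unassayed combination; the function names, the cosine similarity, and the default `k` are all illustrative choices.

```python
import math


def cosine_on_shared(u, v):
    """Cosine similarity over the positions observed in both rows."""
    pairs = [(a, b) for a, b in zip(u, v) if a is not None and b is not None]
    if not pairs:
        return 0.0
    dot = sum(a * b for a, b in pairs)
    norm_u = math.sqrt(sum(a * a for a, _ in pairs))
    norm_v = math.sqrt(sum(b * b for _, b in pairs))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def impute(matrix, row, col, k=2):
    """Estimate matrix[row][col] from the k most similar rows with col observed."""
    candidates = []
    for r, other in enumerate(matrix):
        if r == row or other[col] is None:
            continue
        sim = cosine_on_shared(matrix[row], other)
        if sim > 0.0:  # keep only positively correlated neighbors
            candidates.append((sim, other[col]))
    candidates.sort(key=lambda item: item[0], reverse=True)
    top = candidates[:k]
    if not top:
        return None  # no usable neighbor, so no estimate
    total = sum(sim for sim, _ in top)
    # Similarity-weighted average of the neighbors' observed values.
    return sum(sim * value for sim, value in top) / total
```

Restricting the rows or the similarity computation to a chosen subset of cell types would be one simple way to take cell type into account, though how the paper does this is not stated in the abstract.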

Topsy-Turvy: integrating a global view into sequence-based PPI prediction.

Singh, Rohit; Devkota, Kapil; Sledzieski, Samuel; Berger, Bonnie; Cowen, Lenore.

Bioinformatics ; 38(Suppl 1): i264-i272, 2022 06 24.

Article in English | MEDLINE | ID: mdl-35758793

ABSTRACT

SUMMARY: Computational methods to predict protein-protein interaction (PPI) typically segregate into sequence-based 'bottom-up' methods that infer properties from the characteristics of the individual protein sequences, or global 'top-down' methods that infer properties from the pattern of already known PPIs in the species of interest. However, a way to incorporate top-down insights into sequence-based bottom-up PPI prediction methods has been elusive. We thus introduce Topsy-Turvy, a method that newly synthesizes both views in a sequence-based, multi-scale, deep-learning model for PPI prediction. While Topsy-Turvy makes predictions using only sequence data, during the training phase it takes a transfer-learning approach by incorporating patterns from both global and molecular-level views of protein interaction. In a cross-species context, we show it achieves state-of-the-art performance, offering the ability to perform genome-scale, interpretable PPI prediction for non-model organisms with no existing experimental PPI data. In species with available experimental PPI data, we further present a Topsy-Turvy hybrid (TT-Hybrid) model which integrates Topsy-Turvy with a purely network-based model for link prediction that provides information about species-specific network rewiring. TT-Hybrid makes accurate predictions for both well- and sparsely-characterized proteins, outperforming both its constituent components as well as other state-of-the-art PPI prediction methods. Furthermore, running Topsy-Turvy and TT-Hybrid screens is feasible for whole genomes, and thus these methods scale to settings where other methods (e.g. AlphaFold-Multimer) might be infeasible. The generalizability, accuracy and genome-level scalability of Topsy-Turvy and TT-Hybrid unlock a more comprehensive map of protein interaction and organization in both model and non-model organisms. AVAILABILITY AND IMPLEMENTATION: https://topsyturvy.csail.mit.edu.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Protein Interaction Mapping, Proteins, Amino Acid Sequence, Protein Interaction Mapping/methods, Proteins/genetics, Proteins/metabolism

GLIDER: function prediction from GLIDE-based neighborhoods.

Devkota, Kapil; Schmidt, Henri; Werenski, Matt; Murphy, James M; Erden, Mert; Arsenescu, Victor; Cowen, Lenore J.

Bioinformatics ; 38(13): 3395-3406, 2022 06 27.

Article in English | MEDLINE | ID: mdl-35575379

ABSTRACT

MOTIVATION: Protein function prediction, based on the patterns of connection in a protein-protein interaction (or association) network, is perhaps the most studied of the classical, fundamental inference problems for biological networks. A highly successful set of recent approaches use random walk-based low-dimensional embeddings that tend to place functionally similar proteins into coherent spatial regions. However, these approaches lose valuable local graph structure from the network when considering only the embedding. We introduce GLIDER, a method that replaces a protein-protein interaction or association network with a new graph-based similarity network. GLIDER is based on a variant of our previous GLIDE method, which was designed to predict missing links in protein-protein association networks, capturing implicit local and global (i.e. embedding-based) graph properties. RESULTS: GLIDER outperforms competing methods on the task of predicting GO functional labels in cross-validation on a heterogeneous collection of four human protein-protein association networks derived from the 2016 DREAM Disease Module Identification Challenge, and also on three different protein-protein association networks built from the STRING database. We show that this is due to the strong functional enrichment that is present in the local GLIDER neighborhood in multiple different types of protein-protein association networks. Furthermore, we introduce the GLIDER graph neighborhood as a way for biologists to visualize the local neighborhood of a disease gene. As an application, we look at the local GLIDER neighborhoods of a set of known Parkinson's Disease GWAS genes, rediscover many genes which have known involvement in Parkinson's disease pathways, plus suggest some new genes to study. AVAILABILITY AND IMPLEMENTATION: All code is publicly available and can be accessed here: https://github.com/kap-devkota/GLIDER. 
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Computational Biology, Parkinson Disease, Humans, Computational Biology/methods, Algorithms, Proteins/metabolism

Bioengineered models of Parkinson's disease using patient-derived dopaminergic neurons exhibit distinct biological profiles in a 3D microenvironment.

Fiore, Nicholas J; Ganat, Yosif M; Devkota, Kapil; Batorsky, Rebecca; Lei, Ming; Lee, Kyongbum; Cowen, Lenore J; Croft, Gist; Noggle, Scott A; Nieland, Thomas J F; Kaplan, David L.

Cell Mol Life Sci ; 79(2): 78, 2022 Jan 19.

Article in English | MEDLINE | ID: mdl-35044538

ABSTRACT

Three-dimensional (3D) in vitro culture systems using human induced pluripotent stem cells (hiPSCs) are useful tools to model neurodegenerative disease biology in physiologically relevant microenvironments. Though many successful biomaterials-based 3D model systems have been established for other neurodegenerative diseases, such as Alzheimer's disease, relatively few exist for Parkinson's disease (PD) research. We employed tissue engineering approaches to construct a 3D silk scaffold-based platform for the culture of hiPSC-dopaminergic (DA) neurons derived from healthy individuals and PD patients harboring LRRK2 G2019S or GBA N370S mutations. We then compared results from protein, gene expression, and metabolic analyses obtained from two-dimensional (2D) and 3D culture systems. The 3D platform enabled the formation of dense dopamine neuronal network architectures and developed biological profiles both similar to and distinct from 2D culture systems in healthy and PD lines. PD cultures developed in 3D platforms showed elevated levels of α-synuclein and alterations in purine metabolite profiles. Furthermore, computational network analysis of transcriptomic networks nominated several novel molecular interactions occurring in neurons from patients with mutations in LRRK2 and GBA. We conclude that the brain-like 3D system presented here is a realistic platform to interrogate molecular mechanisms underlying PD biology.

Subjects

Dopaminergic Neurons/pathology, Parkinson Disease/pathology, Bioengineering, Three-Dimensional Cell Culture Techniques, Cultured Cells, Dopaminergic Neurons/cytology, Humans, Induced Pluripotent Stem Cells/cytology, Induced Pluripotent Stem Cells/pathology, Neurogenesis, Silk/chemistry, Tissue Scaffolds/chemistry

MUNDO: protein function prediction embedded in a multispecies world.

Arsenescu, Victor; Devkota, Kapil; Erden, Mert; Shpilker, Polina; Werenski, Matthew; Cowen, Lenore J.

Bioinform Adv ; 2(1): vbab025, 2022.

Article in English | MEDLINE | ID: mdl-36699351

ABSTRACT

Motivation: Leveraging cross-species information in protein function prediction can add significant power to network-based protein function prediction methods, because so much functional information is conserved across at least close scales of evolution. We introduce MUNDO, a new cross-species co-embedding method that combines a single-network embedding method with a co-embedding method to predict functional annotations in a target species, leveraging also functional annotations in a model species network. Results: Across a wide range of parameter choices, MUNDO performs best at predicting annotations in the mouse network, when trained on mouse and human protein-protein interaction (PPI) networks, in the human network, when trained on human and mouse PPIs, and in Baker's yeast, when trained on Fission and Baker's yeast, as compared to competitor methods. MUNDO also outperforms all the cross-species methods when predicting in Fission yeast when trained on Fission and Baker's yeast; however, in this single case, discarding the information from the other species and using annotations from the Fission yeast network alone usually performs best. Availability and implementation: All code is available and can be accessed here: github.com/v0rtex20k/MUNDO. Supplementary information: Supplementary data are available at Bioinformatics Advances online. Additional experimental results are on our github site.

GLIDE: combining local methods and diffusion state embeddings to predict missing interactions in biological networks.

Devkota, Kapil; Murphy, James M; Cowen, Lenore J.

Bioinformatics ; 36(Suppl_1): i464-i473, 2020 07 01.

Article in English | MEDLINE | ID: mdl-32657369

ABSTRACT

MOTIVATION: One of the core problems in the analysis of biological networks is the link prediction problem. In particular, existing interaction networks are noisy and incomplete snapshots of the true network, with many true links missing because those interactions have not yet been experimentally observed. Methods to predict missing links have been more extensively studied for social than for biological networks; it was recently argued that there is some special structure in protein-protein interaction (PPI) network data that might mean that alternate methods may outperform the best methods for social networks. Based on a generalization of the diffusion state distance, we design a new embedding-based link prediction method called global and local integrated diffusion embedding (GLIDE). GLIDE is designed to effectively capture global network structure, combined with alternative network type-specific customized measures that capture local network structure. We test GLIDE on a collection of three recently curated human biological networks derived from the 2016 DREAM disease module identification challenge as well as a classical version of the yeast PPI network in rigorous cross-validation experiments. RESULTS: We indeed find that different local network structure is dominant in different types of biological networks. We find that the simple local network measures are dominant in the highly connected network core between hub genes, but that GLIDE's global embedding measure adds value in the rest of the network. For example, we make GLIDE-based link predictions from genes known to be involved in Crohn's disease, to genes that are not known to have an association, and make some new predictions, finding support in other network data and the literature. AVAILABILITY AND IMPLEMENTATION: GLIDE can be downloaded at https://bitbucket.org/kap_devkota/glide. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Algorithms, Saccharomyces cerevisiae, Diffusion, Humans, Protein Interaction Mapping
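
The GLIDE abstract above describes combining a local network measure with a global, embedding-based measure to rank candidate missing links. The sketch below illustrates that general pattern under loud assumptions: it is not the GLIDE scoring formula. The local measure is a plain common-neighbors count, the embedding is supplied by the caller rather than derived from the diffusion state distance, and the additive combination with a hypothetical `weight` parameter is chosen only for simplicity.

```python
import math


def common_neighbors(adj, u, v):
    """Local score: number of neighbors shared by u and v."""
    return len(adj[u] & adj[v])


def embedding_score(emb, u, v):
    """Global score: closer points in the embedding score higher (at most 1.0)."""
    return 1.0 / (1.0 + math.dist(emb[u], emb[v]))


def rank_missing_links(adj, emb, weight=1.0):
    """Score every non-adjacent pair and return (score, u, v), best first.

    Assumption: the additive combination below is illustrative only.
    """
    nodes = sorted(adj)
    scored = []
    for i, u in enumerate(nodes):
        for v in nodes[i + 1:]:
            if v in adj[u]:
                continue  # already an observed link
            score = common_neighbors(adj, u, v) + weight * embedding_score(emb, u, v)
            scored.append((score, u, v))
    scored.sort(reverse=True)
    return scored
```

Here `adj` maps each node to the set of its neighbors and `emb` maps each node to a coordinate tuple. Because the embedding term never exceeds 1.0 at the default weight, pairs that share a neighbor always rank above pairs that share none, and the embedding term orders pairs with equal local scores.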
